Region of interest-specific loss functions improve T2 quantification with ultrafast T2 mapping MRI sequences in knee, hip and lumbar spine

MRI T2 mapping sequences quantitatively assess tissue health and depict early degenerative changes in musculoskeletal (MSK) tissues like cartilage and intervertebral discs (IVDs) but require long acquisition times. In MSK imaging, small features in cartilage and IVDs are crucial for diagnoses and must be preserved when reconstructing accelerated data. To these ends, we propose region of interest-specific postprocessing of accelerated acquisitions: a recurrent UNet deep learning architecture that provides T2 maps in knee cartilage, hip cartilage, and lumbar spine IVDs from accelerated T2-prepared snapshot gradient-echo acquisitions, optimizing for cartilage and IVD performance with a multi-component loss function that most heavily penalizes errors in those regions. Quantification errors in knee and hip cartilage were under 10% and 9% from acceleration factors R = 2 through 10, respectively, with bias for both under 3 ms for most of R = 2 through 12. In IVDs, mean quantification errors were under 12% from R = 2 through 6. A Gray Level Co-Occurrence Matrix-based scheme showed knee and hip pipelines outperformed state-of-the-art models, retaining smooth textures for most R and sharper ones through moderate R. Our methodology yields robust T2 maps while offering new approaches for optimizing and evaluating reconstruction algorithms to facilitate better preservation of small, clinically relevant features.

aliasing artifacts in resulting images that must be removed through subsequent postprocessing. Some proposed approaches to these ends are reconstruction strategies such as parallel imaging (PI), compressed sensing (CS), model-based reconstructions, deep learning (DL), low-rank and sparse modeling methods, and MR Fingerprinting (MRF). Most of these approaches design an algorithm or exploit the redundancy of k-space acquisition across multiple coils to predict the appearance of the fully-sampled reconstructed image.
PI was one of the earliest techniques to accelerate MRI acquisition and has seen clinical adoption. Here, the redundancy of a multiple coil acquisition is leveraged to mitigate aliasing artifacts [18][19][20] , reducing clinical scan time up to acceleration factor R = 3 for MSK applications 21,22 . CS 23 has also shown promise, where aliased images are iteratively reconstructed by minimizing an objective function, retaining fidelity to acquired k-space and imposing sparsity on the reconstructed image in another domain. CS has attained clinically acceptable MSK image quality through roughly R = 4 21,22,24,25 , and up to R = 8 in research settings for knee cartilage T 1ρ mapping 26 . Similarly, PI and CS have also been applied sequentially (and simultaneously) for further acceleration 27 .
For cMRI acceleration, model-based reconstructions have gained traction, integrating the physics of T 2 /T 2 * decay and T 1 recovery into an objective function iteratively optimized to reconstruct maps, showing promise in brain and lumbar spine T 2 mapping [28][29][30] . More generally, incorporation of the physics of MRI parameter recovery/decay has seen applications not just in model-based approaches, but in various aspects of other methodologies as well 31 . DL approaches have gained prominence in solving inverse problems such as reconstruction, allowing for cMRI reconstructions at higher R than other methods. Standalone DL approaches have seen promising results in knee MAPSS acceleration, T 1 mapping, and T 2 mapping sequences [32][33][34][35][36] . In other methodologies, DL has been integrated with model-based approaches while introducing loss functions to maintain fidelity to acquired k-space, seeing promise up to R = 8 in knee and brain T 1 and T 2 mapping [37][38][39] . DL has also been applied to accelerate T 2 mapping in MR Fingerprinting, where it can remove aliasing artifacts from undersampled acquisitions and/or replace time-consuming dictionary lookup steps to predict MR parameter maps, while exploiting spatial correlations within maps to improve reconstructions 40,41 . Lastly, aside from DL, low-rank and sparse modeling methods have emerged as a means of accelerating acquisitions, where several MRI images acquired at different echo times are decomposed into temporal basis functions and spatial coefficients to model an MRI parameter, showing promise through R = 8 42 .
These works represent great progress, although avenues for improvement remain. Above all, these methods have optimized reconstructed images for full-volume performance; however, in MSK applications, clinical assessment relies on the inspection of precise anatomic features in specific anatomic regions, and consequently, the reconstruction quality cannot be compromised within these regions. Put differently, given clinical context, strong image quality may be most important in specific regions of an image, leaving room for algorithm optimization. Furthermore, most recent published approaches leverage k-space data in formal reconstruction approaches, but for niche applications such as region of interest (ROI)-focused optimization, such approaches may be outperformed by DL-based post-processing algorithms that denoise and fit undersampled T 2 -weighted images without using raw k-space. Moreover, performance of standard reconstruction algorithms is typically evaluated using metrics such as structural similarity index (SSIM), normalized root mean square error (NRMSE), and peak signal-to-noise ratio (PSNR), but recent works show these metrics may not provide the best correspondence with radiologist annotations 43,44 , leading other groups to propose alternate metrics to fill this niche 45 .
To these ends, this study proposes a recurrent UNet pipeline to postprocess undersampled coil-combined T 2 -weighted echo images, fitting and predicting T 2 maps from accelerated MAPSS acquisitions in the knee, hip and lumbar spine 46,47 . These algorithms are trained with multi-component, ROI-specific losses that optimize predicted maps for T 2 value and textural retention in cartilage and IVDs. In doing so, our approach allows for ROI-specific optimization, facilitating retention of small, crucial clinical features in tissues of interest while building on past applications of weighted loss functions for image processing tasks 48 .
To summarize, the contributions of this work are as follows:
• By using a 4-component loss function in network training, we introduce the concept of "ROI-specific optimization" of cMRI accelerated acquisition pipelines.
• We conduct a thorough ablation study of these 4 loss function components, proving the value of all in retaining textures in predicted maps while retaining high fidelity to ground truth T 2 values.
• Acknowledging that standard evaluation metrics such as SSIM and NRMSE provide suboptimal sensitivity to clinically relevant metrics, we conduct a thorough Gray Level Co-Occurrence Matrix (GLCM)-metric-based analysis of smooth and sharp textural retention in predicted maps, with an eye towards better evaluation of retention of small features crucial to clinical diagnoses 49,50 .
• We build on limited literature in hip and lumbar spine cMRI accelerated acquisition schemes by developing and evaluating our pipeline not only in knee cartilage, as several other works have done, but also for hip cartilage and lumbar spine IVD in ultrafast acquisitions.

www.nature.com/scientificreports/

times (TSLs) for T 1ρ quantification, and three additional T 2 -prepared images for T 2 quantification (TSL = 0 ms images were shared for TE = 0 ms images). In this study, only T 2 -prepared images at four different TEs and corresponding T 2 maps from the MAPSS sequence were used. k y -k z space was acquired within an elliptical coverage (area = 0.7 compared to rectangular k y -k z , not acquiring corner space). Knee images were acquired from patients having ACL injuries, with scans taken at baseline and 3 years post-reconstruction. Hip images were acquired from patients having hip OA. Lumbar spine images were acquired from healthy subjects or patients with low back pain. Table 1 shows acquisition parameters.

Tables S2 and S13, with results on those splits described in Supplementary Tables S14, S15, S16.
to corresponding TE = 0 ms images using a 3D rigid registration algorithm with a normalized mutual information criterion 54 . Levenberg-Marquardt fitting of registered T 2 weighted images yielded ground truth T 2 maps 55 .

Methods
To simulate accelerated acquisition, coil-combined T 2 weighted magnitude images after reconstruction (ARC for knee and hip) were Fourier transformed and retrospectively undersampled using a center-weighted Poisson disc pattern, fully sampling a central 5% square in k y -k z (R = 2, 3, 4, 6, 8, 10, 12). Acquisition times associated with ground truth and accelerated MAPSS acquisitions in each body part can be found in Supplementary Table S1. As MAPSS acquires phase-encode lines with elliptical coverage in k y -k z (relative area of 0.7 compared to rectangular coverage), phase encoding lines solely within the sampling ellipse were undersampled. Although k-space data were synthesized from coil-combined magnitude images, retrospective undersampling was done and R reported with respect to elliptical coverage in k y -k z to accurately simulate an actual undersampling pattern and not overstate model performance 56 . However, for hip acquisitions, reconstructed space outside the y-FOV had already been discarded; thus, simulating acquisitions with application of 'no phase wrap' was not possible and undersampling patterns would differ from those implemented on a scanner. T 2 weighted images from each echo time were undersampled with a unique pattern. For k y -k z lines not sampled at a given echo time, those k y -k z lines were initialized with the corresponding k y -k z from the image with the temporally closest echo time for which that k y -k z was sampled. Only k y -k z lines that were not sampled at any echo time were zero-filled. k-Space was subsequently inverse Fourier transformed, yielding undersampled, aliased images.

DL pipeline training. DL architecture. An overview of the data processing and training schemes is shown in Fig. 1, while a detailed diagram depicting our proposed network architecture is in Supplementary Fig. S1 ("Full Model"; 39,808,710 trainable parameters).
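The sharing of k y -k z lines between echo times in the retrospective undersampling step can be sketched as below. This is a minimal illustration, not the authors' implementation: the function names, the array layout (echo, k y , k z ), and the tie-breaking rule when two echoes are equally close in time are all assumptions, and generation of the center-weighted Poisson disc masks themselves is left to the caller.

```python
import numpy as np

def share_kspace_lines(kspace, masks, echo_times):
    """Fill unsampled ky-kz locations from the temporally closest sampled echo.

    kspace: complex array (n_echoes, ny, nz); masks: bool array, True = sampled.
    Locations sampled at no echo time remain zero-filled.
    """
    kspace = np.asarray(kspace, dtype=complex)
    masks = np.asarray(masks, dtype=bool)
    echo_times = np.asarray(echo_times, dtype=float)
    filled = np.where(masks, kspace, 0)
    for e in range(len(echo_times)):
        # Donor echoes ordered by |TE difference|; ties keep index order (assumption).
        order = np.argsort(np.abs(echo_times - echo_times[e]), kind="stable")
        donors = [o for o in order if o != e]
        missing = ~masks[e]
        for o in donors:
            take = missing & masks[o]
            filled[e][take] = kspace[o][take]
            missing = missing & ~take
    return filled

def simulate_accelerated_images(images, masks, echo_times):
    """Magnitude images -> synthesized k-space -> undersample and share -> aliased images."""
    kspace = np.fft.fftshift(np.fft.fft2(images, axes=(-2, -1)), axes=(-2, -1))
    filled = share_kspace_lines(kspace, masks, echo_times)
    return np.abs(np.fft.ifft2(np.fft.ifftshift(filled, axes=(-2, -1)), axes=(-2, -1)))
```

Each echo keeps its own sampled lines, borrows the rest from the nearest echo in time that sampled them, and leaves a location at zero only when no echo sampled it.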
Magnitude images from data undersampled as specified were fed into a recurrent UNet network. The network contains an initial recurrent portion: aliased images from each T 2 echo time have a 5-layer processing stream of 2D 3 × 3 convolutions with stride 1, yielding layers of depth 64, 128, 256, 512, and 1. Residual connections connect input aliased images with processing stream outputs. 2D 3 × 3 convolutions with stride 1 and residual connections transfer information between temporally adjacent corresponding hidden echo time processing layers with weighting parameter λ w = 0.2 57 . This soft-weighted view-sharing of neighboring T 2 weighted echo time images facilitated sharing of feature map information between temporally adjacent echo time images, which can augment sharing of k y -k z initializations to improve network image predictions. Outputs of all 4 echo time image processing streams were concatenated and fed to a UNet that predicted T 2 maps. 2D 3 × 3 convolutions with stride 2 were used for the encoder, and 2D 4 × 4 transpose convolutions with stride 2 for the decoder. Two additional architecture versions were also trained: one UNet with no recurrent portion ("No RNN"; 35,116,037 trainable parameters) and a second in which all layers apart from inputs to the recurrent portion and UNet had half the depth listed in Supplementary Fig. S1 ("Reduced Parameters"; 9,958,246 trainable parameters).

Figure 1. Proposed pipeline. Experiments in proposed study entail generating ground truth T 2 maps from MAPSS, simulating accelerated acquisition of T 2 -weighted MAPSS images, and training a network to predict T 2 maps from undersampled images. (1) MAPSS contains 7 images, 3 that are T 2 weighted, 3 T 1ρ weighted, and 1 shared; the T 2 and shared image weightings are extracted, registered, and fitted slice-wise to yield ground truth T 2 maps. To simulate accelerated acquisition, each volume of coil-combined magnitude T 2 weighted images acquired at a given echo time are Fourier transformed, undersampled along the k y -k z plane with a center-weighted Poisson disc pattern, and inverse Fourier transformed to yield a simulated accelerated acquisition of a volume. Finally, undersampled T 2 weighted images acquired at all echo times for the same anatomic slice are concatenated and fed to the proposed recurrent UNet architecture, which predicts the T 2 map appearance for the slice. Training is done slice-wise with a multi-component loss function that includes a novel ROI-specific L 1 loss that optimizes predicted T 2 maps in cartilage and IVD ROIs, with other components that improve training stability and encourage retention of textures.
Loss function. Networks were trained with the multi-part loss function shown in Eq. (1):

$$\mathcal{L} = \lambda_{L_1}\mathcal{L}_{L_1} + \lambda_{L_1,\varphi}\mathcal{L}_{L_1,\varphi} + \lambda_{SSIM}\mathcal{L}_{SSIM} + \lambda_{Feature}\mathcal{L}_{Feature} \qquad (1)$$

in which $\mathcal{L}_{L_1}$ is a scaled global L 1 loss detailed in Eq. (2), where $T_2$ represents ground truth T 2 , $\hat{T}_2$ represents predicted T 2 , and S(x) is a translated and scaled sigmoid operator that assigns more weight to higher T 2 values. Sharp contrasts and high T 2 values can easily be lost in accelerated acquisition schemes, so S(x) proved useful through empirical testing in focusing networks to preserve these details. S(x) is defined in Eq. (3), where x l and x h are the low and high T 2 value limits between which the sigmoid operator weighting transitions from y l to y h . Parameters selected for the knee were as follows: x l = 0 ms, x h = 100 ms, y l = 0.1, y h = 1.0. In the hip: x l = 0 ms, x h = 60 ms, y l = 0.5, y h = 1.0. In the lumbar spine: x l = 0 ms, x h = 150 ms, y l = 0.25, y h = 1.0. A schematic of the operator that results from parameters of all three anatomies can be found as Supplementary Fig. S2. $\mathcal{L}_{L_1,\varphi}$ is the ROI-specific L 1 loss, described in Eq. (4), where $T_{2,\varphi}$ are ground truth T 2 values in the tissue of interest φ (IVD or cartilage), scaled by S(x) (Eq. (3)), and $\hat{T}_{2,\varphi}$ is the same for predicted T 2 . Pixels corresponding to φ are obtained from segmentation masks, the generation of which is described in "Training and Segmentation Details". For both $\mathcal{L}_{L_1}$ and $\mathcal{L}_{L_1,\varphi}$ , L 1 norms were used instead of L 2 due to reduced sensitivity to outliers, leading to more stable trainings. $\mathcal{L}_{SSIM}$ is an SSIM loss, described in Eq. (5), where SSIM is the structural similarity index between predicted and target maps. $\mathcal{L}_{Feature}$ is a feature-based loss function designed to retain sharper textures, calculated as in Eq. (6), where $VGG_{T_2}$ and $VGG_{\hat{T}_2}$ are the outputs of the 21st layer of a VGG-19 58 network pretrained on ImageNet when fed resized and normalized target and predicted T 2 maps, respectively.
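As a rough illustration of how the sigmoid weighting and the global and ROI-specific L 1 terms might fit together, the NumPy sketch below combines them into a weighted sum. Several parts are assumptions rather than the published definitions: the logistic form and `steepness` of `sigmoid_weight`, the choice to apply S(x) as a per-pixel weight on the absolute error, and all function and dictionary key names. The SSIM and VGG feature terms are passed in as precomputed scalars. Only the knee values of x l , x h , y l , and y h are taken from the text.

```python
import numpy as np

# Knee parameters quoted in the text; the functional form below is an assumption.
KNEE_PARAMS = dict(x_l=0.0, x_h=100.0, y_l=0.1, y_h=1.0)

def sigmoid_weight(t2, x_l, x_h, y_l, y_h, steepness=10.0):
    """Hypothetical S(x): logistic rising from about y_l near x_l to about y_h near x_h."""
    midpoint = 0.5 * (x_l + x_h)
    z = steepness * (np.asarray(t2, dtype=float) - midpoint) / (x_h - x_l)
    return y_l + (y_h - y_l) / (1.0 + np.exp(-z))

def weighted_l1(t2_true, t2_pred, params, mask=None):
    """Mean absolute error weighted by S(ground truth); optionally restricted to an ROI."""
    weights = sigmoid_weight(t2_true, **params)
    err = weights * np.abs(np.asarray(t2_true, dtype=float) - np.asarray(t2_pred, dtype=float))
    if mask is not None:
        err = err[np.asarray(mask, dtype=bool)]
    return float(err.mean()) if err.size else 0.0

def total_loss(t2_true, t2_pred, roi_mask, params, lambdas, ssim_loss=0.0, feature_loss=0.0):
    """Weighted sum of the four components; SSIM and feature terms are supplied by the caller."""
    return (lambdas["l1"] * weighted_l1(t2_true, t2_pred, params)
            + lambdas["l1_roi"] * weighted_l1(t2_true, t2_pred, params, roi_mask)
            + lambdas["ssim"] * ssim_loss
            + lambdas["feature"] * feature_loss)
```

Setting `lambdas["l1_roi"]` to zero in this sketch corresponds to the "no ROI-specific loss" baseline examined in the ablation experiments.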
Maps were resized to 224 × 224 × 1, concatenated with themselves along the channel axis to yield 224 × 224 × 3 inputs, and normalized such that the channels had mean pixel values of 0.485, 0.456 and 0.406, with standard deviations of 0.229, 0.224, and 0.225, respectively. $\lambda_{L_1}$, $\lambda_{L_1,\varphi}$, $\lambda_{SSIM}$, and $\lambda_{Feature}$ were loss component weightings. All were positive-valued and optimized through constrained random hyperparameter searches within predefined ranges.

Training and segmentation details. Scans of all three anatomies were split into training, validation and test sets as shown in Table 1. In the knee, cartilage was segmented manually. In the hip, cartilage was segmented manually for 4 central slices per volume. Segmentation in both was performed by research assistants trained by radiologists with over 20 years of experience. Since the hip dataset had substantially fewer segmented than unsegmented slices, the hip training set was bootstrapped to equalize the number of slices with and without segmentations (1068 bootstrapped slices). Finally, in the lumbar spine, IVDs were segmented with an ensemble of coarse-to-fine context memory (CFCM) networks 59 . To calculate performance metrics and implement ROI-specific training losses, these segmentation masks were leveraged to identify pixels in tissues of interest (cartilage or IVD).
Signal values were scaled per slice for the middle 95% of pixel values to fall between 0 and 500 for the knee and lumbar spine, and 0 and 100 for the hip; these ranges were optimized empirically. During training, imaging volumes were augmented with random translation (± 10 pixels across phase and frequency directions) and random rotation (± 5 degrees about slice direction). All models were trained with learning rate 0.001 and Adam optimizer on an NVIDIA Titan Xp 12 GB GPU with batch size of 1 so the model would fit on a single GPU. Separate pipelines were trained for all 3 anatomies at R = 2, 3, 4, 6, 8, 10, and 12. For each pipeline, and at each trained R, a constrained random hyperparameter search was done for 15 iterations at 10 epochs per iteration to optimize $\lambda_{L_1}$, $\lambda_{L_1,\varphi}$, $\lambda_{SSIM}$, and $\lambda_{Feature}$ for visual fidelity of predicted maps to ground truth. Visual fidelity was assessed in the search using NRMSE (calculated as shown in Eq. (7)) and Pearson's r in the tissue of interest 60 .
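The per-slice signal scaling described at the start of this paragraph could look like the following sketch. It assumes the "middle 95%" is bounded by the 2.5th and 97.5th percentiles and that values outside that band are left unclipped; neither detail is stated in the text, and `scale_slice` is a hypothetical name.

```python
import numpy as np

def scale_slice(img, upper, central_fraction=0.95):
    """Linearly rescale a slice so its central band of pixel values spans [0, upper].

    upper would be 500 for knee and lumbar spine and 100 for hip, per the text.
    Values outside the central band fall outside [0, upper] (no clipping; assumption).
    """
    img = np.asarray(img, dtype=float)
    tail = 100.0 * (1.0 - central_fraction) / 2.0
    lo, hi = np.percentile(img, [tail, 100.0 - tail])
    if hi == lo:
        # Constant slice: nothing to scale.
        return np.zeros_like(img, dtype=float)
    return (img - lo) / (hi - lo) * upper
```

For example, `scale_slice(knee_slice, upper=500.0)` would be applied to each slice before it is fed to the network, where `knee_slice` is a placeholder for a 2D magnitude image.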
Final pipelines across all anatomies and R were trained using optimized parameter sets until validation loss did not decrease for 10 epochs. Key training details are summarized as part of Table 1.

Experiments
Loss function ablation study. An ablation study is key to understanding the contributions of individual loss components.
Given optimized loss function weights, every combination of loss components was ablated and corresponding models were retrained until validation loss no longer decreased. "No RNN" and "Reduced Parameters" networks were also trained while maintaining loss function components at optimized values to assess the utility of simpler architectures. NRMSE and Pearson's correlation coefficient (r) were calculated in tissues of interest across the test set for original and ablated models to determine loss component contributions to performance. Pearson's r was deemed an appropriate statistical test for this and subsequent experiments, as it is useful in assessing the linear relationship between related pairs of interval data. While no formal NRMSE test was done, it nonetheless allows for quantitative assessment of T 2 quantification quality and easy comparison with results from other approaches. NRMSE is reported ± 1 standard deviation (s.d.); Pearson's r was deemed significant in accordance with corresponding P values, α = 0.001, 0.01, and 0.05. NRMSEs within tissues of interest of a given scan were also multiplied by mean T 2 values within the tissue of interest of that patient, generating T 2 value equivalents of error rates.
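The ROI-level metrics used in this experiment can be sketched as follows. Because Eq. (7) is not reproduced here, the normalization is an assumption: this sketch divides the RMSE by the mean ground truth T 2 in the ROI, so that multiplying NRMSE by that mean returns an error in ms. The function name and dictionary keys are hypothetical, and significance testing of Pearson's r is omitted.

```python
import numpy as np

def roi_metrics(t2_true, t2_pred, mask):
    """NRMSE, Pearson's r, and a T2-equivalent error (ms) inside a tissue mask."""
    mask = np.asarray(mask, dtype=bool)
    truth = np.asarray(t2_true, dtype=float)[mask]
    pred = np.asarray(t2_pred, dtype=float)[mask]
    rmse = np.sqrt(np.mean((pred - truth) ** 2))
    nrmse = rmse / truth.mean()          # assumed normalization (not confirmed by the text)
    r = np.corrcoef(truth, pred)[0, 1]   # Pearson's correlation coefficient
    return {"nrmse": float(nrmse),
            "pearson_r": float(r),
            "t2_error_ms": float(nrmse * truth.mean())}
```

Reporting the error in ms alongside the percentage makes it easier to compare against T 2 differences considered clinically meaningful.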
To more specifically evaluate the utility of the ROI-specific loss component, two loss function configurations from the ablation study were further analyzed at all R: no ROI-specific loss component ($\lambda_{L_1,\varphi} = 0$; $\lambda_{L_1}, \lambda_{SSIM}, \lambda_{Feature} \neq 0$) and no ROI-specific or feature-based components ($\lambda_{L_1,\varphi}, \lambda_{Feature} = 0$; $\lambda_{L_1}, \lambda_{SSIM} \neq 0$). These models were intended to represent, respectively, a baseline in which all loss components were preserved except the ROI-specific one, and a standard reconstruction loss function of pixel and SSIM-based components. Pearson's r was calculated both globally and within tissues of interest to determine the degree and significance of correlation between predicted maps and ground truth, α = 0.001, 0.01, and 0.05.

To apply CS reconstruction, original MAPSS T 2 -prepared images were Fourier transformed into coil-combined k-space, 1D-inverse Fourier transformed along the readout direction, and individual slices in k y − k z reconstructed using an L 1 wavelet-based algorithm with regularization coefficient 0.001 61 . CS reconstructed images were registered to the TE = 0 ms echo time image using a 3D rigid registration algorithm with a normalized mutual information criterion and fitted using Levenberg-Marquardt fitting to yield T 2 maps. Performance of these approaches and our proposed methods was evaluated through the following:

Comparison of global and ROI-specific performance. To test for completeness of training, performance of our proposed pipelines was compared against state-of-the-art models that did not use ROI-specific components in predicting T 2 maps. Pearson's r (α = 0.001, 0.01, and 0.05) was used to compare model performances and assess strength of correlations to ground truth T 2 .
Standard reconstruction metrics. Performance was reported in tissues of interest with standard reconstruction metrics: NRMSE (mean ± 1 s.d.) and Pearson's r (α = 0.001, 0.01, and 0.05). NRMSEs were also converted into T 2 value equivalents by tissue compartment as in the ablation study.

T 2 value retention. Fidelity of predicted maps to ground truth T 2 was also assessed. First, predicted and ground truth T 2 values were compared across tissues of interest within the test set (mean ± 1 s.d.), generating violin plots for all three anatomies with overlaid boxplots for T 2 value distribution comparison. T 2 agreement was also assessed through Bland-Altman analysis.
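For reference, the Bland-Altman quantities used in this analysis (mean bias and ± 1.96 s.d. limits of agreement) can be computed as in the sketch below. The inputs are assumed to be paired ROI-average T 2 values, one pair per cartilage compartment or disc; the function name and the use of the sample standard deviation are choices made for illustration.

```python
import numpy as np

def bland_altman(truth_means, pred_means):
    """Return (bias, lower limit, upper limit) for paired ROI-mean T2 values in ms."""
    truth_means = np.asarray(truth_means, dtype=float)
    pred_means = np.asarray(pred_means, dtype=float)
    diff = pred_means - truth_means      # positive bias means T2 is overestimated
    bias = diff.mean()
    sd = diff.std(ddof=1)                # sample standard deviation (assumption)
    return float(bias), float(bias - 1.96 * sd), float(bias + 1.96 * sd)
```

A line of equality (difference of zero) lying inside the returned limits indicates no systematic disagreement at that acceleration factor, even when the bias itself is non-zero.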
Texture retention. Gray Level Co-Occurrence Matrix (GLCM) 62 metrics were used to assess texture retention within tissues of interest. GLCM contrast and dissimilarity are maximized by large local pixel value changes and thus by sharper textures. GLCM homogeneity is maximized by small local pixel value changes, while GLCM energy and angular second moment (ASM) are maximized by few total pixel values within an image; hence, all three are maximized by smoothness. For each anatomy and R, we calculated these texture metrics at 4 orientations (θ = 0°, 45°, 90° and 135°; d = 1 pixel) and averaged across all orientations. Finally, we calculated intraclass correlation coefficients (ICCs) for all metrics with respect to ground truth (two-way mixed effects, single rater 63 ) and reported 95% ICC confidence intervals (α = 0.001, 0.01, and 0.05). These tests were chosen as appropriate, as they assess both reliability and agreement of associated metrics, and in this use case, individual GLCM metric values themselves are considered the only rater, justifying the ICC test type selected.
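The five GLCM metrics and their averaging over the four orientations can be illustrated with the NumPy sketch below. It is a simplified stand-in for whatever implementation was actually used: the number of gray levels, the min-max quantization, the symmetric normalized matrix, the definition of energy as the square root of ASM, and the restriction to a rectangular patch (rather than an irregular tissue mask) are all assumptions.

```python
import numpy as np

# (row, column) pixel offsets for d = 1 at 0, 45, 90 and 135 degrees.
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}

def cooccurrence(q, offset, levels):
    """Symmetric, normalized gray level co-occurrence matrix for one offset."""
    dr, dc = offset
    rows, cols = q.shape
    r0, r1 = max(0, -dr), min(rows, rows - dr)
    c0, c1 = max(0, -dc), min(cols, cols - dc)
    src = q[r0:r1, c0:c1].ravel()
    dst = q[r0 + dr:r1 + dr, c0 + dc:c1 + dc].ravel()
    counts = np.zeros((levels, levels))
    np.add.at(counts, (src, dst), 1)
    counts = counts + counts.T
    return counts / counts.sum()

def glcm_texture_metrics(patch, levels=16):
    """Average each texture metric over the four orientations (patch must be at least 2 x 2)."""
    patch = np.asarray(patch, dtype=float)
    lo, hi = patch.min(), patch.max()
    if hi > lo:
        q = np.floor((patch - lo) / (hi - lo) * (levels - 1) + 0.5).astype(int)
    else:
        q = np.zeros(patch.shape, dtype=int)
    i, j = np.indices((levels, levels))
    per_angle = []
    for offset in OFFSETS.values():
        p = cooccurrence(q, offset, levels)
        asm = np.sum(p ** 2)
        per_angle.append({
            "contrast": np.sum(p * (i - j) ** 2),
            "dissimilarity": np.sum(p * np.abs(i - j)),
            "homogeneity": np.sum(p / (1.0 + (i - j) ** 2)),
            "ASM": asm,
            "energy": np.sqrt(asm),
        })
    return {k: float(np.mean([m[k] for m in per_angle])) for k in per_angle[0]}
```

A perfectly uniform patch gives zero contrast and dissimilarity with homogeneity, ASM and energy of 1, which matches the description of the first two metrics as sensitive to sharpness and the last three as sensitive to smoothness.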

Repeatability study.
To assess the robustness of pipelines to different datasets, two additional splits of the knee, hip and spine datasets were made, ensuring no patient was part of multiple validation and/or test datasets and that all scans from a given patient were only in one of training, validation and test for each split (folds 2 and 3 in Supplementary Table S2, where fold 1 is the original split). Additional hyperparameter searches optimized loss function weights on the two new splits. Optimized loss weights and corresponding T 2 quantification and texture retention performance for each split are presented at all tested R in the same manner as for the primary split.
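One way to build splits that satisfy both constraints (all scans of a patient in a single subset within a split, and no patient held out for validation or testing in more than one split) is sketched below. The function name, the fixed numbers of validation and test patients, and the seeded shuffle are illustrative assumptions; the actual fold assignments are those in Supplementary Table S2.

```python
import random

def patient_level_folds(patient_of_scan, n_folds=3, n_val=1, n_test=1, seed=0):
    """Return one dict of scan indices per fold with keys 'train', 'val', 'test'.

    patient_of_scan: list giving the patient ID of each scan.
    Held-out (val + test) patients are disjoint across folds.
    """
    patients = sorted(set(patient_of_scan))
    held_out = n_val + n_test
    if n_folds * held_out > len(patients):
        raise ValueError("not enough patients for disjoint held-out sets")
    random.Random(seed).shuffle(patients)
    folds = []
    for k in range(n_folds):
        block = patients[k * held_out:(k + 1) * held_out]
        val, test = set(block[:n_val]), set(block[n_val:])
        fold = {"train": [], "val": [], "test": []}
        for scan_idx, patient in enumerate(patient_of_scan):
            if patient in val:
                fold["val"].append(scan_idx)
            elif patient in test:
                fold["test"].append(scan_idx)
            else:
                fold["train"].append(scan_idx)
        folds.append(fold)
    return folds
```

Splitting on patient IDs rather than scan indices matters here because some patients contribute several scans (for example, baseline and follow-up knee scans), which would otherwise leak between training and evaluation.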
Raw multicoil data assessment. An in-house pipeline was developed that leveraged GE Orchestra 1.10 and other postprocessing tools to reconstruct coil-combined images from raw k-space data. As a proof of concept, knee MAPSS scans were performed on 3 volunteers, hip scans for 2, and lumbar spine for 2, all using the acquisition parameters listed for the retrospective datasets used for algorithm training, with raw k-space data saved for all. Multicoil k-space data (after ARC for knee and hip) was undersampled with the same center-weighted Poisson disc pattern described earlier, with each coil seeing the same undersampling pattern and k y -k z lines being shared across different T 2 weighted echo time k-spaces as previously described. Coil-combined images resulting from undersampled multi-coil data at all tested R were fed through corresponding post-processing pipelines to predict T 2 map appearance. A radiologist with 2 years of experience segmented knee cartilage, hip cartilage, and intervertebral discs from these acquisitions, allowing for visualizations of predicted T 2 maps and NRMSE calculations in ROIs.

Results
Ablation study results. Voxel-wise performance metrics for ablation study models at R = 8 are shown in Supplementary ROI-specific and global assessments of best models and corresponding models trained without an ROI-specific loss (λ 1,ϕ = 0) and models trained with a generic loss (λ 1,ϕ = 0, λ Feat = 0) are shown in Supplementary Table S5. In the knee and hip, across nearly all R, ROI-specific loss addition leads to improved correlations between predicted and ground truth cartilage T 2 , with diminished performance globally. In the lumbar spine, which was trained with substantially fewer batches than the knee and hip pipelines, these trends were inconsistent across tested R. Example predictions and ground truth for one slice of a patient in each pipeline are shown in Supplementary Fig. S3, showing that patterns of local T 2 value elevations in cartilage and IVDs are better preserved with an ROI-specific loss as opposed to pipelines trained without the loss component.

Visuals of network performance and comparison with state-of-the-art models. Predicted T 2 maps are displayed at select R for knee, hip and lumbar spine models in Fig. 2 for our three pipelines and three methods from the literature. In knee, hip, and lumbar spine, T 2 quantification performance is strongest with our proposed methods, maintaining low error rates, showing promising results compared with state-of-the-art methods through R = 10. Optimal architecture performances are further explored in Figs. 3-5. As shown in Fig. 3a, predicted T 2 knee maps retained strong fidelity to ground truth within tibiofemoral joint cartilage. Patterns within predicted maps became slightly more diffuse as R increased to 10, as indicated by a slight rise in NRMSE for cartilage in the slice, but visually, T 2 values and map patterns are preserved. As seen in Fig. 4a, hip predicted maps preserve T 2 values well in femoral and acetabular cartilage through R = 10, although T 2 patterns become more diffuse by R = 10. Figure 5a shows T 2 map predictions in the lumbar spine. The L4-L5 IVD is shown in more detail, where T 2 quantification performance was acceptable at R = 3, moderate at R = 6, and worse at R = 10, as indicated by rising IVD NRMSEs.
ROI and global performance comparisons of our selected pipelines against state-of-the-art approaches are in Supplementary Table S5. Across pipelines trained with relatively large datasets (knee and hip), DL and model-based approaches (MANTIS and MANTIS-GAN) outperformed our proposed pipeline globally, but within cartilage ROIs, our pipeline exhibited stronger Pearson's r at each tested R. These trends were not as strong in the lumbar spine pipelines, possibly owing to the randomness of training with a smaller dataset. Global and ROI-specific T 2 predictions are further visualized in Supplementary Fig. S4, showing predicted T 2 values exhibit substantially more visual fidelity to ground truth and lower NRMSE in state-of-the-art models compared to our pipeline, but a reversal of that trend in cartilage. In the lumbar spine, at some but not all R, those trends held, yielding similar conclusions to the Pearson's r analysis.

Evaluation of T 2 quantification performance and comparison with state-of-the-art models. Voxel-wise T 2 evaluation fidelity. Pearson's r and NRMSE across all anatomies and R for our approaches and state-of-the-art methods are in Table 2. Supplementary Tables S7 and S8 show knee T 2 quantification performance across cartilage compartments. For the full model, T 2 estimation errors remained under 10% through R = 10 across all cartilage compartments while Pearson's r ranged from 0.748 at R = 2 to 0.491 at R = 12, indicating strong correlations 64 between predictions and ground truth at R = 2 and moderate correlations through R = 12. For some cartilage compartments and R, performance was stronger in the No RNN pipeline. Interestingly, quantification performance was strongest in patellofemoral joint cartilage, generally exhibiting lower NRMSE and stronger correlations. Our ROI-specific loss pipelines outperformed state-of-the-art models in each cartilage compartment.
Supplementary Tables S9 and S10 show hip T 2 quantification performance across cartilage compartments. As in the knee, quantification performance was strong, with error rates across all cartilage under 9% through R = 12 for the no RNN and full model pipelines. While the no RNN pipeline had lower quantification errors, the full model had higher Pearson's r, which ranged from 0.794 at R = 2 to 0.517 at R = 12, showing strong correlations between predictions and ground truth through R = 3 and moderate correlations through R = 12. T 2 quantification performance was slightly stronger in femoral than acetabular cartilage. Our pipelines again outperformed state-of-the-art models in each cartilage compartment.
Supplementary Tables S11 and S12 show lumbar spine T 2 quantification performance, which was mixed. Pearson's r across all discs was very high, ranging from 0.884 at R = 2 to 0.643 at R = 12 for the no RNN model, indicating strong correlations through R = 8 and moderate correlations through R = 12 to ground truth. That said, IVD error rates were markedly higher across all R than in hip and knee cartilage, ranging from 4.86% to 18.8%. Though there was some volatility, error rates and Pearson's r generally showed poorest T 2 quantification in L1/L2 and L2/L3 discs. Through R = 8, ROI-specific loss pipelines outperformed state-of-the-art models at nearly all disc levels, with stronger Pearson's r in most IVD levels through R = 12.

Figure 3. T 2 quantification performance of optimal ROI-specific pipeline in knee cartilage. (a) Visual pipeline performance within the knee for a representative patient, with corresponding NRMSEs for cartilage in the predicted T 2 map slice. Performance remains strong through R = 10, maintaining T 2 patterns in the medial tibiofemoral cartilage, indicating pipeline utility. Predicted maps generated by the network are masked using a cartilage segmentation mask and superimposed on the ground truth, fully sampled TE = 0 ms MAPSS echo time image. (b) Bland-Altman plots for all scans within test set for which multiclass cartilage compartment segmentations were available (n = 16, 6 cartilage compartments for each). Predicted T 2 values demonstrate minimal bias and tight limits of agreement across most tested R, with best performance coming from patellofemoral cartilage.

T 2 Value retention on region of interest averages.
Bland-Altman plots are provided for the knee, hip and lumbar spine in Figs. 3b, 4b, and 5b. In knee and hip, T 2 values are predicted with minimal bias with respect to ground truth. The ± 1.96 s.d. limits of agreement were less than approximately ± 6 ms with mean biases under ± 3 ms through R = 8 for knee cartilage (Fig. 3b). Among cartilage compartments, predictions in trochlear and patellar cartilage showed the least bias, while tibiofemoral cartilage T 2 was generally slightly overestimated. In the hip (Fig. 4b), ± 1.96 s.d. limits of agreement were less than approximately ± 5 ms with mean biases under ± 3 ms through R = 12, and T 2 quantification performance was similar across femoral and acetabular cartilage. In the lumbar spine (Fig. 5b), limits of agreement were considerably wider than in the hip and knee pipelines, particularly above R = 4. While the line of equality was contained in these limits at all R, spine pipelines generally overestimated T 2 values. While at some particular R, a disc level saw poorer T 2 quantification than others (e.g. L2/L3 at R = 6), on balance, predicted maps yielded similar bias and error across all discs. Supplementary Fig. S5 shows T 2 value distributions in violin and boxplots. Plots reveal minimal bias in hip cartilage predicted T 2 maps and slight but limited bias towards overestimating T 2 in knee cartilage. In the lumbar spine, more volatility was observed in predicted T 2 distributions, likely due to small test set size (n = 5), but at least through R = 6, these deviations had limited magnitude.
Texture retention.
ICCs ± 1 s.d. for GLCM metrics are in Table 3 for our best performing pipelines: no RNN and full model. In knee cartilage, ICCs showed significant correlations between predicted and ground truth

Repeatability study.
Optimal loss weightings from hyperparameter searches on the two additional splits are in Supplementary Table S13. Results of trainings on additional splits in T2 quantification error, Pearson's r, and texture metrics are in Supplementary Tables S14, S15, S16. In the knee and hip pipelines, experiments show comparable results across all folds for these metrics. In the lumbar spine, Pearson's r exhibited similar values across all folds, but in some cases, mean texture metric ICCs and NRMSEs exhibited substantial differences. However, confidence intervals were very wide for ICCs and NRMSEs in the lumbar spine, likely due to limited test set size (n = 5).

Figure 5. (a) Visual pipeline performance within the lumbar spine IVDs for a representative patient, with corresponding NRMSEs for IVDs in the predicted T2 map slice. Predicted maps are masked using an IVD segmentation mask and superimposed on the ground truth, fully sampled TE = 0 ms MAPSS echo time image. Network performance is best through R = 6, after which local T2 elevations are diffuse and underestimated. (b) Bland-Altman plots for all scans within test set (n = 5, 5 IVDs plotted for each if segmentation of disc available). T2 value predictions reflect some bias and fairly wide limits of agreement, particularly above R = 4. These results indicate progress but also the need for improvement. The smaller lumbar spine dataset and test set size are likely responsible for poorer model performance compared to the hip and knee, as is the relatively smaller number of slices in kz, which exacerbates undersampling effects.
Scientific Reports | (2022) 12:22208 | https://doi.org/10.1038/s41598-022-26266-z www.nature.com/scientificreports/

Table 2. ROI-specific model performance in standard metrics from R = 2 through R = 12. Top performing pipeline for each metric, at each R, is shown in bold. Performances of pipelines trained with ROI-specific losses and other state-of-the-art methods in T2 quantification error rates in knee cartilage, hip cartilage, and lumbar spine IVDs. NRMSEs are reported ± 1 s.d., and Pearson's r is reported with significances as follows: *P < 0.05, **P < 0.01, ***P < 0.001 (knee: n = 90; hip: n = 15; lumbar spine: n = 5). Across all anatomies, performances were strongest in ROI-specific loss pipelines (Full Model, Reduced Parameters, and No RNN): in the knee, the No RNN and Full Model pipelines particularly excelled across all tested R; in the hip, the No RNN pipeline was strong in maintaining minimal T2 quantification errors, while the Full Model and Reduced Parameters models had strongest correlations between predicted maps and ground truth; in the lumbar spine, the No RNN pipeline especially had strong T2 quantification performance. Performance in the knee and hip pipelines is strong and below clinically significant T2 changes across nearly all tested R, while Pearson's r indicates strong T2 value preservation in the lumbar spine through R = 6. T2 quantification performance is thus promising in all three pipelines, but particularly for the knee and hip.

Raw multicoil data assessment.
Supplementary Fig. S6 shows T2 maps predicted from our proposed pipelines on retrospectively undersampled raw k-space data. In the knee, T2 quantification errors were low through R = 12, with local T2 elevations preserved and little dip in performance compared to corresponding retrospectively undersampled coil-combined knee data. In the hip, T2 quantification errors were low, with local T2 elevations reproduced at most R; while performance at higher R matched expected performance from coil-combined experiments, lower R quantification errors were slightly higher. Performance was more volatile in the lumbar spine, where through R = 4, T2 quantification errors matched expected results and local T2 patterns were generally preserved, but performance degraded substantially above R = 4.

Discussion and conclusions.
In this work, we present data-driven pipelines that leverage recurrent UNet architectures and multi-component losses to accelerate MAPSS T2 mapping for anatomies where a subset of tissues is of particular clinical interest. By image processing and standard reconstruction metrics, through R = 10, our knee pipelines retained fidelity to T2 values with tight limits of agreement, preserving smooth textures with good to excellent reliability and sharper ones with moderate reliability for most tested R. While the no RNN pipeline delivered lower NRMSEs and higher Pearson's r across many cartilage compartments and R than the full model, its texture retention was poorer, making the full model better suited to preserve small, key diagnostic features. In hip cartilage, predicted maps retained T2 fidelity through R = 12 with tight limits of agreement, preserved smooth textures with good to excellent agreement across tested R, and maintained sharper textures at low to moderate R. As with the knee, texture retention was strongest in the full pipeline despite lower no RNN NRMSEs. In IVDs, the no RNN pipeline delivered best standard reconstruction metric and texture retention performance. Despite maintaining smoother textures with moderate to excellent agreement across tested R and preserving sharper textures at lower R, the IVD pipeline revealed biases and fairly wide limits of agreement in T2 preservation, particularly at R = 6 and higher. When assessed on retrospectively undersampled multicoil raw k-space data, the knee and hip pipelines saw minimal degradation in performance as compared to results from images undersampled via synthetic k-space, whereas the lumbar spine pipeline exhibited similar performance through R = 4. Furthermore, repeatability studies indicated that, particularly for the hip and knee, performance was stable with respect to datasets.
All told, these metrics indicate promise for the knee and hip pipelines in MAPSS T2 mapping acceleration, and progress but room for improvement in IVDs.
Assessments of ROI-specific loss component utility showed its potential for improving predictions in accelerated acquisition schemes. When trained with sufficiently large datasets, as our knee and hip pipelines were, its inclusion yielded stronger fidelity to local T2 patterns in cartilage ROIs and reduced T2 quantification errors compared to analogous pipelines trained without the ROI-specific loss component. Compared to state-of-the-art DL pipelines, knee and hip pipelines saw improved Pearson's r in cartilage ROIs but poorer global Pearson's r, as expected from the focused training approach. Interestingly, CS approaches exhibit relatively strong NRMSEs while generating relatively smooth predicted T2 maps; this is possibly because in training, DL-based approaches simultaneously removed aliasing artifacts and performed T2 fitting, and could attempt to preserve finer details than a CS approach performing those steps sequentially. While our approaches outperformed state-of-the-art methods at many R and tissue compartments in the lumbar spine, global Pearson's r indicated this may have been partially due to some models being more completely trained than others. These results may have been different with a larger lumbar spine training set. Nonetheless, the value of ROI-specific loss functions in accelerated acquisition pipelines is clear: with sufficiently large datasets, they can optimize for ROIs and outperform state-of-the-art approaches at high R, as existing approaches are optimized for global and not ROI-specific performance.
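The principle behind an ROI-specific loss component can be sketched in a few lines. The example below is not the multi-component loss used in this work; it is a simplified illustration that assumes a single L1 term over the whole image plus a more heavily weighted L1 term restricted to a binary ROI mask, with a hypothetical weight `roi_weight`.

```python
def roi_weighted_l1(pred, truth, mask, roi_weight=5.0):
    """Global mean absolute error plus a weighted mean absolute error inside the ROI.

    Simplified illustration only: roi_weight is a hypothetical hyperparameter,
    and this is not the loss formulation used in the study.
    """
    abs_err = [[abs(p - t) for p, t in zip(prow, trow)]
               for prow, trow in zip(pred, truth)]
    all_err = [e for row in abs_err for e in row]
    roi_err = [e for row, mrow in zip(abs_err, mask)
               for e, m in zip(row, mrow) if m]
    global_term = sum(all_err) / len(all_err)
    roi_term = sum(roi_err) / len(roi_err) if roi_err else 0.0
    return global_term + roi_weight * roi_term

# Toy 2x2 T2 maps in ms; only the top-left pixel belongs to the ROI.
truth = [[10.0, 10.0], [10.0, 10.0]]
mask = [[1, 0], [0, 0]]
pred_error_in_roi = [[12.0, 10.0], [10.0, 10.0]]
pred_error_outside = [[10.0, 12.0], [10.0, 10.0]]

loss_in_roi = roi_weighted_l1(pred_error_in_roi, truth, mask)
loss_outside = roi_weighted_l1(pred_error_outside, truth, mask)
print(loss_in_roi, loss_outside)
```

Both predictions have the same global error, but the one whose error falls inside the ROI is penalized far more. This is also why a network trained this way may trade global Pearson's r for better ROI performance.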
We can contextualize performance by comparing quantification errors to clinically significant T2 changes. In the knee, T2 increases 13.4% in lateral femoral condyle (LFC) cartilage, 12.3% in medial femoral condyle (MFC) cartilage, and 8.1% in medial tibial condyle (MTC) cartilage among patients with mild OA compared to controls [65]. Our top-performing knee pipeline saw errors below this benchmark through R = 12 in the LFC and at R = 2 in the MTC. In IVDs, T2 decreases 36.3% in the nucleus pulposus and 24.2% in the annulus fibrosus from healthy to degenerative discs [66]. Our top-performing pipeline saw quantification errors for each disc below the more stringent 24.2% through R = 12. In the hip, T2 values among healthy patients that progress to OA within 18 months are 7.3% higher in femoral and 5.2% higher in acetabular cartilage compared to controls [67]. Our top-performing hip pipeline had errors below these benchmarks at all R in femoral cartilage and at R = 2 in acetabular cartilage. Clinical metrics thus depict promise for pipelines in all three anatomies in maintaining sub-clinical-significance quantification errors.
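The comparison against clinical benchmarks amounts to checking, for each tissue and acceleration factor, whether the percent quantification error is smaller than the magnitude of the clinically significant T2 change. A minimal sketch follows; the benchmark percentages are those quoted above, while the dictionary keys, the function name, and the per-R error values are hypothetical placeholders rather than study results.

```python
# Magnitudes of clinically significant T2 changes (%), as quoted in the text.
CLINICAL_T2_CHANGE_PCT = {
    "knee_LFC": 13.4,
    "knee_MFC": 12.3,
    "knee_MTC": 8.1,
    "ivd_nucleus_pulposus": 36.3,
    "ivd_annulus_fibrosus": 24.2,
    "hip_femoral": 7.3,
    "hip_acetabular": 5.2,
}

def passing_factors(errors_by_r, benchmark_pct):
    """Return the acceleration factors whose percent error is below the benchmark."""
    return [r for r, err in sorted(errors_by_r.items()) if err < benchmark_pct]

# Hypothetical percent errors per acceleration factor (placeholders, not study data).
hypothetical_mtc_errors = {2: 7.5, 4: 8.6, 6: 9.0, 8: 9.4}

mtc_passing = passing_factors(hypothetical_mtc_errors, CLINICAL_T2_CHANGE_PCT["knee_MTC"])
print(mtc_passing)  # [2]
```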
Clinical and standard metrics show knee and hip pipeline performances to be particularly promising: the T2 values, map texture preservation, and error rates relative to clinical benchmarks all mark meaningful progress towards reducing cMRI acquisition time for eventual clinical use. That said, while lumbar spine performance was strong by clinical metrics, it lagged the knee and hip by standard reconstruction metrics. One explanation is dataset size: the lumbar spine dataset had substantially fewer scans and imaging slices than the knee and hip. This has a twofold impact: (1) the strength of a model trained from a smaller dataset is inherently limited, and (2) having only 5 test set scans limits statistical power and induces wide standard deviations of metrics, preventing significant conclusions from being reached. The effects of this small dataset size particularly surface in repeatability studies. Furthermore, lumbar spine acquisitions were more susceptible to breathing artifacts and had fewer slices than the hip and knee; undersampling therefore left fewer lumbar spine ky-kz lines sampled compared to the hip and knee, inducing worse initializations and possibly poorer performance. Nonetheless, to our knowledge, this is the first DL application to accelerate lumbar spine cMRI, marking progress that must be furthered with additional data procurement and algorithm development for clinical utility.
The GLCM-based textural retention evaluation demonstrated a framework through which reconstruction performance can be better evaluated than through standard metrics like SSIM, NRMSE, and PSNR. ICCs of GLCM metrics between predicted and ground truth T2 maps allow for intuitive, scaled measurements that can reflect how well a particular texture was preserved: for example, visual inspection of predicted T2 maps in knee and hip cartilage in Figs. 3 and 4 indicates that sharp textures are preserved better by the hip pipeline. This qualitative observation is confirmed by the GLCM Dissimilarity ICCs observed for the full model in the hip and knee pipelines in Table 3 at several tested R. This work could be furthered by extending this analysis to additional GLCM metrics for an even more thorough assessment of textural feature retention. Additional future improvements could also include pre-processing cartilage and IVD tissues prior to GLCM metric calculation to improve stability of these metrics, as other groups have started to do [68]. Moreover, by showing results at 7 acceleration factors instead of the 2-3 typical in the literature, we found performance did not always degrade steadily as R increased. Networks therefore may be sensitive not just to general undersampling patterns, but also to the specific nature of the pattern. Thus, when future DL reconstruction pipelines are trained, a library of undersampling patterns may be advisable to encourage robustness to sampling patterns [69].
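To make the GLCM-based evaluation more concrete, the sketch below computes a normalized gray level co-occurrence matrix for one pixel offset and derives the Dissimilarity metric from it. It assumes the T2 map has already been quantized to integer gray levels and uses a symmetric GLCM with a horizontal offset of one pixel; the quantization, offsets, and pre-processing used in the study may differ, and the ICC computation across subjects is not shown.

```python
def glcm(image, levels, dx=1, dy=0, symmetric=True):
    """Normalized gray level co-occurrence matrix for a single pixel offset."""
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for y in range(rows):
        for x in range(cols):
            ny, nx = y + dy, x + dx
            if 0 <= ny < rows and 0 <= nx < cols:
                i, j = image[y][x], image[ny][nx]
                counts[i][j] += 1
                if symmetric:
                    counts[j][i] += 1  # count each pair in both directions
    total = sum(sum(row) for row in counts)
    return [[c / total for c in row] for row in counts]

def dissimilarity(p):
    """GLCM Dissimilarity: sum over i, j of p[i][j] * |i - j|."""
    n = len(p)
    return sum(p[i][j] * abs(i - j) for i in range(n) for j in range(n))

# Toy 4x4 maps quantized to 4 gray levels (illustrative, not study data).
smooth_map = [[0, 0, 1, 1], [0, 0, 1, 1], [2, 2, 3, 3], [2, 2, 3, 3]]
sharp_map = [[0, 3, 0, 3], [3, 0, 3, 0], [0, 3, 0, 3], [3, 0, 3, 0]]

d_smooth = dissimilarity(glcm(smooth_map, levels=4))
d_sharp = dissimilarity(glcm(sharp_map, levels=4))
print(f"smooth: {d_smooth:.3f}, sharp: {d_sharp:.3f}")
```

Dissimilarity grows with the gray level difference between neighboring pixels, so it is larger for sharper textures. Agreement of such a metric between predicted and ground truth maps is what the ICCs in the text summarize.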
This study has limitations. First, we used retrospectively undersampled coil-combined magnitude echo time images that, in the knee and hip, had undergone ARC processing in their reconstruction, with 4 edge slices discarded for all data. Due to coil combination and post-processing, the k-space being undersampled would not match the acquisition's multi-coil k-space. Additionally, while we undersampled the MAPSS acquisition ellipse for each anatomy, the hip acquisitions had 'no phase wrap' applied, meaning that tested undersampling patterns would differ from those implemented on the scanner. While our raw k-space experiments show performance degradation was limited compared to coil-combined magnitude image experiments, models would be stronger if trained with a similarly sized multicoil k-space dataset. Second, this network is specific to our sampling patterns and acquisition parameters, and new pipelines would need to be trained should parameters like MAPSS T2 echo times be substantially changed. Finally, the lumbar spine dataset size is rather small, limiting the power of conclusions.
To conclude, this study shows a novel means of training DL pipelines to accelerate cMRI in anatomies where specific tissues are of heightened clinical importance. In knee and hip, pipelines were effective at high R in maintaining textures, keeping fidelity to T2 values, and minimizing T2 quantification errors, whereas in the lumbar spine, the pipeline performed reasonably in texture retention but more poorly in T2 value fidelity and quantification errors. This reflects progress towards clinically useful pipelines that specialize in MSK T2 mapping. The GLCM-based textural retention analysis offers an alternative to standard reconstruction metrics, allowing for intuitive measures of the types of features best preserved by accelerated acquisition schemes, potentially allowing for better quantitative assessment of model performance. Future directions include multicoil k-space training, simultaneous MAPSS T1ρ and T2 acceleration, and temporal undersampling of T2 weighted echo time images.

Data availability
The datasets analyzed during the current study were collected as part of multi-year studies or volunteer scans at UCSF, and their public release is not currently possible due to data privacy concerns. Code to reproduce the results of this work is available upon reasonable request from the corresponding author (A. Tolpadi).